Sensor Integration and Joint PDF Construction
for Distributed Detection and Classification


Steven Kay, Fellow, IEEE, Quan Ding, Student Member, IEEE, and Muralidhar Rangaswamy, Fellow, IEEE


Abstract 

With multiple sensors in a distributed system, one expects to make better decisions than with a single sensor. We investigate the problem of sensor integration, that is, of combining all the available information. In this paper, we propose a novel method of constructing the joint probability density function (PDF) based on the exponential family. The method does not require knowledge of the marginal PDFs and hence is useful in many practical cases. We prove that our method is asymptotically optimal in Kullback-Leibler (KL) divergence. It is shown that the performance of our method is the same as that of existing methods, while it requires less information.


Index Terms

Distributed detection and classification, exponential family, joint probability density function, Kullback-Leibler divergence, sensor integration


I. Introduction

Distributed systems and information fusion have been widely studied and used in engineering, finance, and scientific research. Applications include radar, sonar, biomedical analysis, stock prediction, weather forecasting, and chemical, biological, radiological, and nuclear (CBRN) detection, to name a few. If the joint probability density functions (PDFs) under each candidate hypothesis were known, we could readily obtain the optimal performance by the Neyman-Pearson rule for detection (binary hypothesis testing) and by the maximum a posteriori probability (MAP) rule for classification (multiple hypothesis testing) [1]. In practice, however, this information may not be available. This usually happens when the dimensionality of the sample space is high and there are not enough training samples to estimate the joint PDF accurately, a difficulty known as the “curse of dimensionality” in pattern recognition and machine learning. Hence it is important to construct an appropriate joint PDF when it is not available. One common approach is to assume that the measurements from different sensors are independent [2], [3]. This approach has been widely used because of its simplicity, since the joint PDF is then the product of the marginal PDFs; it is also known as the “product rule” for combining classifiers [4]. In spite of its popularity, the independence assumption may be a poor one if the measurements are actually correlated. Furthermore, as stated in [4], the product rule is severe because “it is sufficient for a single recognition engine to inhibit a particular interpretation by outputting a close to zero probability for it”. Hence other methods that account for the correlation among the measurements have been studied. A copula-based framework is proposed in [5], [6] to construct the joint PDF. The exponentially embedded families (EEFs) are used in [7] to estimate the joint PDF that is asymptotically closest to the true one in Kullback-Leibler (KL) divergence.

Note that all of the above methods require knowledge of the marginal PDFs. In this paper, we consider the case in which the marginal PDFs are unavailable or inaccurate, which can happen because of a high-dimensional sample space and insufficient training data. We present a new way of constructing the joint PDF that requires no knowledge of the marginal PDFs, only a reference PDF. The constructed joint PDF takes the form of an exponential family and incorporates all the available information. The maximum likelihood estimate (MLE) [8] of the unknown parameters is easily found using the properties of the exponential family. It is shown that the constructed PDF is asymptotically optimal in the sense that it is asymptotically closest to the true PDF in KL divergence. Since no Gaussian assumption is made on the reference PDF, the method can be very useful when the underlying distributions are non-Gaussian. We start with the detection problem and then extend our method to the classification problem. For detection, it is shown that under some conditions our detection statistics are the same as those of the clairvoyant generalized likelihood ratio test (GLRT). For classification, our classifier likewise has the same performance as the estimated MAP classifier under some conditions. Both the clairvoyant GLRT and the estimated MAP classifier assume that the true PDFs under each candidate hypothesis are known except for the usual unknown parameters.

The paper is organized as follows. In Section II, we introduce a distributed detection/classification problem. In Section III, we construct the joint PDF by an exponential family and apply it to the problem in Section II. The KL divergence between the true PDF and the constructed PDF is examined in Section IV, and the result shows that the constructed PDF is asymptotically optimal. Examples for distributed detection are given in Section V, and examples for distributed classification are given in Section VI. Simulation results comparing the performance of our method with existing methods are shown in Section VII. In Section VIII, we draw conclusions.

Approved for public release; distribution unlimited

II. Problem Statement

Consider the distributed detection/classification problem in which we observe the outputs of two sensors, $\mathbf{T}_1(\mathbf{x})$ and $\mathbf{T}_2(\mathbf{x})$, which are transformations of underlying samples $\mathbf{x}$ that are themselves unobservable. We choose two sensors for simplicity; all the results in this paper are valid for multiple sensors. For detection, we want to distinguish between two hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$ based on the outputs of the two sensors, and for classification, we have $M$ candidate hypotheses $\mathcal{H}_i$ for $i = 1, 2, \ldots, M$.

Assume that we have enough training data $\mathbf{T}_{1i}(\mathbf{x})$'s and $\mathbf{T}_{2i}(\mathbf{x})$'s under $\mathcal{H}_0$, when no signal is present. Hence we have a good estimate of the joint PDF of $\mathbf{T}_1$ and $\mathbf{T}_2$ under $\mathcal{H}_0$ [9], and thus we assume that $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$ is completely known. Under $\mathcal{H}_1$, or under $\mathcal{H}_i$ for $i = 1, 2, \ldots, M$, when a signal is present, we may not even have enough training data to estimate the marginal PDFs. This is especially the case in the radar scenario, where the target is present for only a small portion of the time. So our goal is to use as much of the available information as possible to construct an appropriate $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ under $\mathcal{H}_1$ for detection, or $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_i)$ under each $\mathcal{H}_i$ for classification. A simple illustration is shown in Figure 1.

III. Joint PDF Construction by Exponential Family and Its Application in Distributed Systems

To start with, we consider the detection problem, where we wish to construct $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$. The result will then be extended to the classification problem.

Since $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ cannot be uniquely specified from $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$, we need the following reasonable assumptions to construct the joint PDF.

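1) Under $\mathcal{H}_1$ the signal is small, and thus $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ is close to $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$.

2) $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ can be parameterized by some signal parameters $\boldsymbol{\theta}$ such that
$$p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1) = p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\boldsymbol{\theta})$$
$$p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0) = p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathbf{0})$$

Fig. 1. Distributed detection/classification system with two sensors. (Figure: the sensor outputs are passed to a central processor, which decides $\mathcal{H}_0$ or $\mathcal{H}_1$ for detection, or $\mathcal{H}_i$, $i = 1, \ldots, M$, for classification.)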

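Note that since $\boldsymbol{\theta}$ represents signal amplitudes, $\boldsymbol{\theta} \neq \mathbf{0}$ under $\mathcal{H}_1$. Therefore, the detection problem is to select between
$$\mathcal{H}_0: \boldsymbol{\theta} = \mathbf{0}$$
$$\mathcal{H}_1: \boldsymbol{\theta} \neq \mathbf{0}$$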

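To simplify the notation, let
$$\mathbf{T} = \begin{bmatrix} \mathbf{T}_1 \\ \mathbf{T}_2 \end{bmatrix}$$
so that we can write the joint PDF $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\boldsymbol{\theta})$ as $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})$. Under the small-signal assumption, it has been shown in [10] that, by using a first-order Taylor expansion of the log-likelihood function $\ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})$, we can construct the PDF of $\mathbf{T}$ as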


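$$p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}) = \exp\left[\boldsymbol{\theta}^T\mathbf{t} - K(\boldsymbol{\theta}) + \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\right] \tag{1}$$
where
$$K(\boldsymbol{\theta}) = \ln E_0\left[\exp\left(\boldsymbol{\theta}^T\mathbf{T}\right)\right] \tag{2}$$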


is the cumulant generating function of $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$; it normalizes the PDF to integrate to 1. Since $\mathbf{T}$ is a sufficient statistic for the constructed exponential PDF in (1), this PDF incorporates all the information from the sensors. Note that only $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ is required in (1) to construct $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})$, and it is assumed that $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ is available or can be estimated with reasonable accuracy. Also note that if $\mathbf{T}_1$ and $\mathbf{T}_2$ are statistically dependent under $\mathcal{H}_0$, they will also be dependent under $\mathcal{H}_1$.
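The construction in (1) and (2) can be checked numerically. The sketch below is a minimal illustration, assuming a scalar $T$ and a reference PDF $p_T(t;0)$ that is a zero-mean two-component Gaussian mixture (the weight and variances are arbitrary illustrative values). It evaluates $K(\theta)$ by trapezoidal integration, compares it with the closed form available for this particular mixture, and confirms that the tilted PDF integrates to one. The names `ref_pdf`, `cgf`, and `constructed_pdf` are hypothetical.

```python
import math

# Assumed reference PDF p_T(t; 0): zero-mean Gaussian mixture (illustrative values only).
W, VAR1, VAR2 = 0.9, 1.0, 4.0


def gauss_pdf(t, var):
    """Zero-mean Gaussian density with variance var."""
    return math.exp(-t * t / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def ref_pdf(t):
    """Reference PDF p_T(t; 0), assumed known (e.g., estimated from H0 training data)."""
    return W * gauss_pdf(t, VAR1) + (1.0 - W) * gauss_pdf(t, VAR2)


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def cgf(theta, lim=40.0):
    """K(theta) = ln E_0[exp(theta T)], eq. (2), by numerical integration."""
    return math.log(trapezoid(lambda t: math.exp(theta * t) * ref_pdf(t), -lim, lim))


def constructed_pdf(t, theta, k_theta):
    """Eq. (1) for scalar T: exp(theta t - K(theta)) * p_T(t; 0), with k_theta = K(theta)."""
    return math.exp(theta * t - k_theta) * ref_pdf(t)


if __name__ == "__main__":
    theta = 0.5
    k = cgf(theta)
    # Closed form for this mixture: ln(W e^{VAR1 theta^2 / 2} + (1 - W) e^{VAR2 theta^2 / 2})
    exact = math.log(W * math.exp(0.5 * VAR1 * theta ** 2)
                     + (1.0 - W) * math.exp(0.5 * VAR2 * theta ** 2))
    mass = trapezoid(lambda t: constructed_pdf(t, theta, k), -40.0, 40.0)
    print(f"K numeric={k:.6f}  K exact={exact:.6f}  total mass={mass:.6f}")
```

Only the reference PDF enters the computation, which mirrors the remark above that (1) needs nothing beyond $p_T(t;0)$.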

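The next step is to estimate the unknown parameters $\boldsymbol{\theta}$. We resort to the MLE [8], maximizing (1) over $\boldsymbol{\theta}$. Note that $K(\boldsymbol{\theta})$ is convex by Hölder's inequality [11]. Since maximizing (1) is equivalent to maximizing $\boldsymbol{\theta}^T\mathbf{t} - K(\boldsymbol{\theta})$, this becomes a convex optimization problem, and many existing methods can be readily utilized [12], [13]. Also, the MLE $\hat{\boldsymbol{\theta}}$ satisfies
$$\mathbf{t} = \left.\frac{\partial K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} \tag{3}$$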
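Because $K(\theta)$ is convex, $K'(\theta)$ is nondecreasing, so (3) has at most one root and a standard root finder can be used. The following minimal sketch assumes a scalar $T$ whose reference PDF is the zero-mean Gaussian mixture $w\mathcal{N}(0,v_1) + (1-w)\mathcal{N}(0,v_2)$, which is chosen for illustration and is the scalar analogue of the model used later in Section V-B. It solves $t = K'(\theta)$ by Newton's method and falls back to bisection whenever a Newton step leaves the current bracket. The function names are hypothetical.

```python
import math


def mixture_cgf_terms(theta, w, v1, v2):
    """Return K(theta), K'(theta), K''(theta) for the scalar reference PDF
    p_T(t; 0) = w N(0, v1) + (1 - w) N(0, v2), using log-sum-exp for stability."""
    a = math.log(w) + 0.5 * v1 * theta ** 2
    b = math.log(1.0 - w) + 0.5 * v2 * theta ** 2
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    k = m + math.log(ea + eb)
    w1, w2 = ea / (ea + eb), eb / (ea + eb)      # component weights after tilting
    mean = (w1 * v1 + w2 * v2) * theta           # K'(theta): mean of the tilted PDF
    second = w1 * (v1 + (v1 * theta) ** 2) + w2 * (v2 + (v2 * theta) ** 2)
    return k, mean, second - mean ** 2           # K''(theta): variance of the tilted PDF


def mle_theta(t, w, v1, v2, tol=1e-12, max_iter=200):
    """Solve t = K'(theta), eq. (3), by Newton's method safeguarded with bisection."""
    if t == 0.0:
        return 0.0
    # K'(theta) = lambda * theta with lambda between min(v1, v2) and max(v1, v2),
    # so the root is bracketed by t / max(v1, v2) and t / min(v1, v2).
    lo, hi = sorted((t / max(v1, v2), t / min(v1, v2)))
    theta = 0.5 * (lo + hi)
    for _ in range(max_iter):
        _, d1, d2 = mixture_cgf_terms(theta, w, v1, v2)
        f = d1 - t
        if abs(f) <= tol * (1.0 + abs(t)):
            break
        if f > 0.0:          # K' is increasing, so the root lies to the left
            hi = theta
        else:
            lo = theta
        newton = theta - f / d2
        theta = newton if lo < newton < hi else 0.5 * (lo + hi)
    return theta


if __name__ == "__main__":
    w, v1, v2 = 0.9, 50.0, 500.0
    for t in (0.5, 5.0, 50.0):
        th = mle_theta(t, w, v1, v2)
        k, d1, _ = mixture_cgf_terms(th, w, v1, v2)
        print(f"t={t:5.1f}  theta_hat={th:.6g}  K'(theta_hat)={d1:.6g}  stat={th * t - k:.6g}")
```

The bracket comes from the fact that, for this mixture, $K'(\theta)$ is $\theta$ times a weighted average of the two variances. A general-purpose convex solver, as cited above, would be needed for reference PDFs without such structure.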


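When the MLE $\hat{\boldsymbol{\theta}}$ is found, we use $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})$ as our estimated PDF under $\mathcal{H}_1$. Hence, similar to the GLRT [1], we decide $\mathcal{H}_1$ if
$$\ln\frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = \hat{\boldsymbol{\theta}}^T\mathbf{t} - K(\hat{\boldsymbol{\theta}}) > \tau \tag{4}$$
where $\tau$ is a threshold. We will show in the next section that $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})$ is asymptotically optimal in the sense of KL divergence.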
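In practice, $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ may only be available through $\mathcal{H}_0$ training data (Section II). One simple possibility, which is our assumption here and not a procedure described in this paper, is to replace the expectation in (2) by a sample average over the training outputs, then solve (3) and evaluate (4). The scalar sketch below does this by bisection, since the empirical $K'(\theta)$ (the exponentially tilted sample mean) is increasing. A root exists only when $t$ lies strictly inside the range of the training samples. The function names and the $\mathcal{N}(0,1)$ training data are hypothetical.

```python
import math
import random


def _tilted_weights(theta, samples):
    """Weights proportional to exp(theta * t_j), shifted by the max for stability."""
    vals = [theta * s for s in samples]
    m = max(vals)
    return m, [math.exp(v - m) for v in vals]


def empirical_cgf(theta, samples):
    """K_hat(theta) = ln((1/n) sum_j exp(theta t_j)): sample-average version of (2)."""
    m, wts = _tilted_weights(theta, samples)
    return m + math.log(sum(wts) / len(samples))


def empirical_cgf_prime(theta, samples):
    """K_hat'(theta): the tilted sample mean, increasing in theta."""
    _, wts = _tilted_weights(theta, samples)
    return sum(w * s for w, s in zip(wts, samples)) / sum(wts)


def glrt_statistic(t, samples, tol=1e-10):
    """Return (theta_hat, theta_hat * t - K_hat(theta_hat)): (3) and (4) with K replaced by K_hat."""
    if not min(samples) < t < max(samples):
        raise ValueError("t must lie strictly inside the range of the H0 training samples")
    lo, hi = -1.0, 1.0
    while empirical_cgf_prime(lo, samples) > t:   # expand the bracket to the left
        lo *= 2.0
    while empirical_cgf_prime(hi, samples) < t:   # expand the bracket to the right
        hi *= 2.0
    while hi - lo > tol:                           # bisection on K_hat'(theta) = t
        mid = 0.5 * (lo + hi)
        if empirical_cgf_prime(mid, samples) < t:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return theta, theta * t - empirical_cgf(theta, samples)


if __name__ == "__main__":
    random.seed(1)
    h0_training = [random.gauss(0.0, 1.0) for _ in range(20000)]  # assumed N(0, 1) under H0
    for t in (0.5, 1.0, 2.0):
        theta_hat, stat = glrt_statistic(t, h0_training)
        # For an exact N(0, 1) reference, theta_hat = t and the statistic equals t^2 / 2.
        print(f"t={t}: theta_hat={theta_hat:.3f}, statistic={stat:.3f}, t^2/2={t * t / 2:.3f}")
```

The estimate of $K(\theta)$ degrades as $|\theta|$ grows, because only a few training samples carry most of the tilted weight. The sketch is therefore most trustworthy in the small-signal regime assumed above.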

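To extend our method to classification, the above two assumptions can simply be modified as follows.

1) The signal is small under each $\mathcal{H}_i$, and hence $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_i)$ is close to $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$.

2) Under each $\mathcal{H}_i$, the joint PDF can be parameterized by some signal parameters $\boldsymbol{\theta}_i$, so that
$$p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_i) = p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\boldsymbol{\theta}_i)$$
$$p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0) = p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathbf{0})$$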


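Similar to (1), as shown in [14], we can construct the PDF of $\mathbf{T}$ under $\mathcal{H}_i$ as
$$p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}_i) = \exp\left[\boldsymbol{\theta}_i^T\mathbf{t} - K(\boldsymbol{\theta}_i) + \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\right] \tag{5}$$
where
$$K(\boldsymbol{\theta}_i) = \ln E_0\left[\exp\left(\boldsymbol{\theta}_i^T\mathbf{T}\right)\right] \tag{6}$$
is the cumulant generating function of $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ that normalizes the constructed PDF. Once the MLE $\hat{\boldsymbol{\theta}}_i$ of $\boldsymbol{\theta}_i$ is found by maximizing $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}_i)$ over $\boldsymbol{\theta}_i$, we take $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}}_i)$ as our estimate of $p_{\mathbf{T}}(\mathbf{t};\mathcal{H}_i)$. Hence, similar to the MAP rule [1], we decide the $\mathcal{H}_i$ for which the following is maximum over $i$:
$$p(\mathcal{H}_i \mid \mathbf{t}) = \frac{p_{\mathbf{T}}(\mathbf{t};\mathcal{H}_i)\,p(\mathcal{H}_i)}{p_{\mathbf{T}}(\mathbf{t})} \approx \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}}_i)\,p(\mathcal{H}_i)}{p_{\mathbf{T}}(\mathbf{t})} \tag{7}$$
When we assume equal prior probabilities for the candidate hypotheses, i.e., $p(\mathcal{H}_1) = \cdots = p(\mathcal{H}_M) = 1/M$, $p(\mathcal{H}_i)$ cancels, and since $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ does not depend on $i$, we can equivalently decide the $\mathcal{H}_i$ for which the following is maximum over $i$:
$$\ln\frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}}_i)}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = \hat{\boldsymbol{\theta}}_i^T\mathbf{t} - K(\hat{\boldsymbol{\theta}}_i) \tag{8}$$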


IV. KL Divergence Between the True PDF and the Constructed PDF

The KL divergence is a non-symmetric measure of the difference between two PDFs. For two PDFs $p_1$ and $p_0$, it is defined as
$$D(p_1 \| p_0) = \int p_1(\mathbf{x}) \ln\frac{p_1(\mathbf{x})}{p_0(\mathbf{x})}\,d\mathbf{x}$$
It is well known that $D(p_1\|p_0) \geq 0$, with equality if and only if $p_1 = p_0$ almost everywhere [15]. By Stein's lemma [16], the KL divergence measures the asymptotic performance for detection; an extension of this result to classification was recently presented in [17]. Next we show that $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})$ is optimal under both hypotheses. That is, under $\mathcal{H}_0$, $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}}) = p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ asymptotically, and under $\mathcal{H}_1$, $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})$ is asymptotically the closest to the true PDF in KL divergence. Similar results and arguments have been shown in [7], [18].

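Assume that we observe independent and identically distributed (IID) $\mathbf{T}_i$'s with
$$\mathbf{T}_i = \begin{bmatrix}\mathbf{T}_{1i} \\ \mathbf{T}_{2i}\end{bmatrix}$$
for $i = 1, 2, \ldots, M$. With a slight abuse of notation, we write $p_{\mathbf{T}_1,\mathbf{T}_2,\ldots,\mathbf{T}_M}(\mathbf{t}_1,\mathbf{t}_2,\ldots,\mathbf{t}_M;\boldsymbol{\theta})$ as $p(\mathbf{t}_1,\mathbf{t}_2,\ldots,\mathbf{t}_M;\boldsymbol{\theta})$; here $\mathbf{T}_1, \mathbf{T}_2, \ldots$ denote the IID observations, not the two sensor outputs. The constructed PDF can be easily extended as
$$p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\boldsymbol{\theta}) = \exp\left[\boldsymbol{\theta}^T\sum_{i=1}^M \mathbf{t}_i - MK(\boldsymbol{\theta}) + \ln p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathbf{0})\right] \tag{9}$$
So we want to maximize
$$\frac{1}{M}\ln\frac{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\boldsymbol{\theta})}{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathbf{0})} = \boldsymbol{\theta}^T\frac{1}{M}\sum_{i=1}^M\mathbf{t}_i - K(\boldsymbol{\theta}) \tag{10}$$
and $\hat{\boldsymbol{\theta}}$ is found by solving
$$\frac{1}{M}\sum_{i=1}^M \mathbf{t}_i = \left.\frac{\partial K(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}} \tag{11}$$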


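Now we consider two cases. First, if the true PDF is under $\mathcal{H}_0$, then by the law of large numbers,
$$\frac{1}{M}\sum_{i=1}^M \mathbf{t}_i \to E_0(\mathbf{T})$$
as $M \to \infty$. Note that
$$\left.\frac{\partial K(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}} = E_0(\mathbf{T})$$
Since the solution of (11) is unique (which holds when $K(\boldsymbol{\theta})$ is strictly convex, so that its gradient is one-to-one), asymptotically we have
$$\hat{\boldsymbol{\theta}} = \mathbf{0}$$
and hence $p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\hat{\boldsymbol{\theta}}) = p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathbf{0})$.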

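Second, if the true PDF is under $\mathcal{H}_1$, then by the law of large numbers,
$$\frac{1}{M}\sum_{i=1}^M\mathbf{t}_i \to E_1(\mathbf{T})$$
as $M\to\infty$. From (10), we are asymptotically maximizing
$$\boldsymbol{\theta}^T E_1(\mathbf{T}) - K(\boldsymbol{\theta}) \tag{12}$$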


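To avoid confusion, we denote the underlying true PDF under $\mathcal{H}_1$ as $p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)$ and our constructed PDF as $p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\boldsymbol{\theta})$. Since
$$\ln\frac{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)}{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\boldsymbol{\theta})} = -\left(\boldsymbol{\theta}^T\sum_{i=1}^M\mathbf{t}_i - MK(\boldsymbol{\theta})\right) + \ln\frac{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)}{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathbf{0})}$$
the KL divergence between the true PDF and the constructed one is
$$\begin{aligned} D\left(p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)\,\|\,p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\boldsymbol{\theta})\right) &= E_1\left[-\left(\boldsymbol{\theta}^T\sum_{i=1}^M\mathbf{t}_i - MK(\boldsymbol{\theta})\right) + \ln\frac{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)}{p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathbf{0})}\right] \\ &= -M\left[\boldsymbol{\theta}^T E_1(\mathbf{T}) - K(\boldsymbol{\theta})\right] + D\left(p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)\,\|\,p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathbf{0})\right) \end{aligned} \tag{13}$$
Since $D(p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)\,\|\,p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathbf{0}))$ does not depend on $\boldsymbol{\theta}$, $D(p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)\,\|\,p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\boldsymbol{\theta}))$ is minimized by maximizing (12). This shows that $p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\hat{\boldsymbol{\theta}})$ is asymptotically the closest to $p(\mathbf{t}_1,\ldots,\mathbf{t}_M;\mathcal{H}_1)$ in KL divergence.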
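The conclusion drawn from (12) and (13) can be checked numerically in the scalar case. The sketch below assumes a Gaussian mixture reference $p_T(t;0)$ and, as the true $\mathcal{H}_1$ PDF, a Gaussian $\mathcal{N}(\mu, s^2)$ that does not belong to the constructed family (all values are illustrative). It minimizes $D(p_1\|p_T(\cdot;\theta))$ over $\theta$ by golden-section search, computes the divergence by quadrature, and checks that the minimizer satisfies $K'(\theta) = E_1(T) = \mu$, which is the stationarity condition of (12). Names and parameter values are hypothetical.

```python
import math

W, V1, V2 = 0.9, 1.0, 4.0   # assumed reference p_T(t; 0) = W N(0, V1) + (1 - W) N(0, V2)
MU, S2 = 0.8, 2.0           # assumed true PDF under H1: N(MU, S2), outside the family


def npdf(t, mean, var):
    return math.exp(-(t - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def ref_pdf(t):
    return W * npdf(t, 0.0, V1) + (1.0 - W) * npdf(t, 0.0, V2)


def cgf(theta):
    """Closed-form K(theta) for the mixture reference (scalar form of (24))."""
    return math.log(W * math.exp(0.5 * V1 * theta ** 2)
                    + (1.0 - W) * math.exp(0.5 * V2 * theta ** 2))


def cgf_prime(theta):
    a = W * math.exp(0.5 * V1 * theta ** 2)
    b = (1.0 - W) * math.exp(0.5 * V2 * theta ** 2)
    return (a * V1 + b * V2) * theta / (a + b)


def kl_true_vs_constructed(theta, lim=30.0, n=6000):
    """D(p_1 || p_T(.; theta)) by the trapezoidal rule, with p_T(t; theta) taken from (1)."""
    h = 2.0 * lim / n
    k = cgf(theta)
    total = 0.0
    for i in range(n + 1):
        t = -lim + i * h
        p1 = npdf(t, MU, S2)
        log_ratio = math.log(p1) - (theta * t - k + math.log(ref_pdf(t)))
        total += (0.5 if i in (0, n) else 1.0) * p1 * log_ratio
    return total * h


def golden_section_min(f, a, b, tol=1e-7):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    theta_star = golden_section_min(kl_true_vs_constructed, -2.0, 2.0)
    print(f"theta*={theta_star:.6f}  K'(theta*)={cgf_prime(theta_star):.6f}  E1[T]={MU}")
```

Because $D$ is $K(\theta) - \theta\mu$ plus a constant, it is convex in $\theta$, so a one-dimensional unimodal search is sufficient here. The minimum value stays strictly positive because the true PDF is not a member of the constructed family.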
V. Examples-Distributed Detection

In this section, we compare our method with the clairvoyant GLRT for some detection problems. The clairvoyant GLRT assumes that we know the true PDF of $\mathbf{T}$ under $\mathcal{H}_1$ except for the underlying unknown parameters $\boldsymbol{\alpha}$, and it decides $\mathcal{H}_1$ if
$$\ln\frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} > \tau \tag{14}$$
where $\hat{\boldsymbol{\alpha}}$ is the MLE of $\boldsymbol{\alpha}$.


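A. Partially Observed Linear Model with Gaussian Noise

Suppose we have the linear model
$$\mathbf{x} = \mathbf{H}\boldsymbol{\alpha} + \mathbf{w} \tag{15}$$
with
$$\mathcal{H}_0: \boldsymbol{\alpha} = \mathbf{0}, \qquad \mathcal{H}_1: \boldsymbol{\alpha} \neq \mathbf{0}$$
where $\mathbf{x}$ is an $N\times 1$ vector of the underlying unobservable samples, $\mathbf{H}$ is an $N\times p$ observation matrix with full column rank, $\boldsymbol{\alpha}$ is a $p\times 1$ vector of unknown signal amplitudes, and $\mathbf{w}$ is an $N\times 1$ vector of white Gaussian noise with known variance $\sigma^2$. We observe two sensor outputs
$$\mathbf{T}_1(\mathbf{x}) = \mathbf{H}_1^T\mathbf{x}, \qquad \mathbf{T}_2(\mathbf{x}) = \mathbf{H}_2^T\mathbf{x} \tag{16}$$
where $\mathbf{H}_1$ is $N\times p_1$ and $\mathbf{H}_2$ is $N\times p_2$. Note that $[\mathbf{H}_1, \mathbf{H}_2]$ need not equal $\mathbf{H}$. This model is called a partially observed linear model.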

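Let $\mathbf{G} = [\mathbf{H}_1, \mathbf{H}_2]$. We assume that $\mathbf{G}$ has full column rank, so that there are no redundant sensor measurements. Then we have
$$\mathbf{T} = \begin{bmatrix}\mathbf{T}_1(\mathbf{x}) \\ \mathbf{T}_2(\mathbf{x})\end{bmatrix} = \begin{bmatrix}\mathbf{H}_1^T\mathbf{x} \\ \mathbf{H}_2^T\mathbf{x}\end{bmatrix} = \mathbf{G}^T\mathbf{x} \tag{17}$$
So $\mathbf{T}$ is also Gaussian, and
$$\mathbf{T} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_0$$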


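Let $q = p_1 + p_2$, so that $\mathbf{T}$ is $q\times 1$. As a result, we construct the PDF as in (1) with
$$K(\boldsymbol{\theta}) = \ln E_0\left[\exp(\boldsymbol{\theta}^T\mathbf{T})\right] = \frac{1}{2}\sigma^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta} \tag{18}$$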
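Hence the constructed PDF is
$$\begin{aligned} p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}) &= \exp\left[\boldsymbol{\theta}^T\mathbf{t} - K(\boldsymbol{\theta}) + \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\right] \\ &= \frac{1}{(2\pi\sigma^2)^{q/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})}\exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma^2}\right)\cdot\exp\left(\boldsymbol{\theta}^T\mathbf{t} - \frac{1}{2}\sigma^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}\right) \end{aligned} \tag{19}$$
which can be simplified as
$$\mathbf{T} \sim \mathcal{N}(\sigma^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_1 \tag{20}$$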


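Note that $\boldsymbol{\theta}$ is the vector of unknown parameters in the constructed PDF, and it is different from the truly unknown parameters $\boldsymbol{\alpha}$. From (3) and (18), the MLE of $\boldsymbol{\theta}$ satisfies
$$\mathbf{t} = \left.\frac{\partial K(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}} = \sigma^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}$$
So
$$\hat{\boldsymbol{\theta}} = \frac{1}{\sigma^2}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$
and the test statistic becomes
$$\hat{\boldsymbol{\theta}}^T\mathbf{t} - K(\hat{\boldsymbol{\theta}}) = \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{21}$$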

Next we consider the clairvoyant GLRT, that is, the GLRT when we know the true PDF of $\mathbf{T}$ under $\mathcal{H}_1$ except for the truly unknown parameters $\boldsymbol{\alpha}$. It is a suboptimal test obtained by plugging the MLE of $\boldsymbol{\alpha}$ into the true PDF parameterized by $\boldsymbol{\alpha}$. Since the constructed PDF may not be the true PDF, the clairvoyant GLRT requires more information than our method. From (17) we know that
$$\mathbf{T} \sim \mathcal{N}(\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_1 \tag{22}$$


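Note that (20) is the constructed PDF, while (22) is the true PDF. We write the true PDF under $\mathcal{H}_1$ as $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha})$. The MLE of $\boldsymbol{\alpha}$ is found by maximizing
$$\ln\frac{p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} - \frac{1}{2\sigma^2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})$$
If $q < p$, i.e., the length of $\mathbf{t}$ is less than the length of $\boldsymbol{\alpha}$, the MLE $\hat{\boldsymbol{\alpha}}$ may not be unique. However, since $(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}) \geq 0$ and we can always find $\hat{\boldsymbol{\alpha}}$ such that $\mathbf{t} = \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}}$ (provided the $q\times p$ matrix $\mathbf{G}^T\mathbf{H}$ has full row rank), the quadratic form attains its minimum value of 0 at $\hat{\boldsymbol{\alpha}}$. Hence the clairvoyant GLRT statistic becomes
$$\ln\frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$
which is the same as our test statistic when $q < p$.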
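The equality of (21) and the clairvoyant GLRT statistic for $q < p$ can be confirmed numerically. The following plain-Python sketch, with small hypothetical helpers (`solve`, `matmul`, and so on), builds $\mathbf{H}$ from the columns of the model used later in (63), takes $\mathbf{G}$ as its first and third columns (so $q = 2 < p = 4$), and computes $\hat{\boldsymbol{\alpha}}$ as the minimum-norm solution of $\mathbf{t} = \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}$. The minimum-norm choice is ours; any exact solution gives the same statistic. The numerical values of $\mathbf{t}$ and $\sigma^2$ are illustrative.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def our_statistic(t, G, var):
    """Eq. (21) via (3) and (18): theta_hat = (G^T G)^{-1} t / var, stat = theta_hat^T t - K(theta_hat)."""
    C = matmul(transpose(G), G)
    theta = [z / var for z in solve(C, t)]
    return dot(theta, t) - 0.5 * var * dot(theta, matvec(C, theta))


def clairvoyant_statistic(t, G, H, var):
    """ln p_T(t; alpha_hat) - ln p_T(t; 0) for T ~ N(G^T H alpha, var G^T G), where alpha_hat is
    the minimum-norm solution of t = G^T H alpha (assumes G^T H has full row rank)."""
    C = matmul(transpose(G), G)
    B = matmul(transpose(G), H)                                   # q x p with q < p
    alpha = matvec(transpose(B), solve(matmul(B, transpose(B)), t))
    resid = [ti - bi for ti, bi in zip(t, matvec(B, alpha))]
    return (dot(t, solve(C, t)) - dot(resid, solve(C, resid))) / (2.0 * var)


def model_h(n_samples, r, f):
    """Rows [1, r^n, cos(2 pi f n), sin(2 pi f n)], as in the model of Section VII."""
    return [[1.0, r ** n, math.cos(2 * math.pi * f * n), math.sin(2 * math.pi * f * n)]
            for n in range(n_samples)]


if __name__ == "__main__":
    H = model_h(20, 0.95, 0.34)
    G = [[row[0], row[2]] for row in H]      # H1, H2 = first and third columns of H
    t, var = [1.2, -0.4], 2.0
    print(our_statistic(t, G, var), clairvoyant_statistic(t, G, H, var))
```

For larger problems a linear-algebra library would replace the hand-written helpers. They are kept here only so that the sketch runs with the standard library alone.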


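B. Partially Observed Linear Model with Gaussian Mixture Noise

The partially observed linear model remains the same as in the previous subsection, except that instead of assuming that $\mathbf{w}$ is white Gaussian, we assume that $\mathbf{w}$ has a two-component Gaussian mixture distribution, i.e.,
$$\mathbf{w} \sim \pi\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{I}) + (1-\pi)\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{I}) \tag{23}$$
where $\pi$, $\sigma_1^2$, and $\sigma_2^2$ are known ($0 < \pi < 1$). The following derivation can be easily extended to $\mathbf{w} \sim \sum_i \pi_i\mathcal{N}(\mathbf{0}, \sigma_i^2\mathbf{I})$.

Since $\mathbf{w}$ has a Gaussian mixture distribution, $\mathbf{T} = \mathbf{G}^T\mathbf{x}$ is also Gaussian mixture distributed, and
$$\mathbf{T} \sim \pi\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{G}^T\mathbf{G}) + (1-\pi)\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_0$$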

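So we have
$$K(\boldsymbol{\theta}) = \ln E_0\left[\exp(\boldsymbol{\theta}^T\mathbf{T})\right] = \ln\left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}}\right) \tag{24}$$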

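Hence the constructed PDF is
$$\begin{aligned} p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}) &= \exp\left[\boldsymbol{\theta}^T\mathbf{t} - K(\boldsymbol{\theta}) + \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\right] \\ &= \left[\frac{\pi}{(2\pi\sigma_1^2)^{q/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})}\exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma_1^2}\right) + \frac{1-\pi}{(2\pi\sigma_2^2)^{q/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})}\exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma_2^2}\right)\right] \\ &\quad\cdot\frac{\exp(\boldsymbol{\theta}^T\mathbf{t})}{\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}}} \end{aligned} \tag{25}$$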


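Although this constructed PDF cannot be simplified further, we can still find the MLE by solving
$$\mathbf{t} = \left.\frac{\partial K(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}} = \frac{\pi e^{\frac{1}{2}\sigma_1^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}}\,\sigma_1^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}}\,\sigma_2^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}}{\pi e^{\frac{1}{2}\sigma_1^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}}} \tag{26}$$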


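Our test statistic is then
$$\hat{\boldsymbol{\theta}}^T\mathbf{t} - K(\hat{\boldsymbol{\theta}}) = \hat{\boldsymbol{\theta}}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\sigma_1^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}}\right) \tag{27}$$
where $\hat{\boldsymbol{\theta}}$ satisfies (26). Although no analytical solution for the MLE of $\boldsymbol{\theta}$ exists, it can be found using convex optimization techniques [12], [13]. Moreover, an analytical solution exists as $\|\boldsymbol{\theta}\| \to 0$. To see this, we will show that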

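$$\lim_{\|\boldsymbol{\theta}\|\to 0} \frac{\partial K(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} \;./\; \left(\pi\sigma_1^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta} + (1-\pi)\sigma_2^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}\right) = \mathbf{1} \tag{28}$$
where $./$ denotes element-by-element division and $\mathbf{1}$ is a vector of ones.

To prove (28), we have
$$\lim_{\|\boldsymbol{\theta}\|\to 0}\left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}}\right) = 1 \tag{29}$$
and
$$\lim_{\|\boldsymbol{\theta}\|\to 0}\left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}}\,\sigma_1^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}}\,\sigma_2^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}\right) \;./\; \left(\pi\sigma_1^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta} + (1-\pi)\sigma_2^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}\right) = \mathbf{1} \tag{30}$$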


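by L'Hôpital's rule. Dividing (30) by (29) and using (26), (28) is proved. As a result of (26) and (28), the MLE of $\boldsymbol{\theta}$ satisfies
$$\mathbf{t} \approx \pi\sigma_1^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}} + (1-\pi)\sigma_2^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}$$
as $\|\hat{\boldsymbol{\theta}}\|\to 0$, and $\hat{\boldsymbol{\theta}}$ can be easily found as
$$\hat{\boldsymbol{\theta}} = \frac{1}{\pi\sigma_1^2 + (1-\pi)\sigma_2^2}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{31}$$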


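Since
$$\lim_{\|\boldsymbol{\theta}\|\to 0} K(\boldsymbol{\theta}) \Big/ \left(\frac{1}{2}\pi\sigma_1^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta} + \frac{1}{2}(1-\pi)\sigma_2^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}\right) = 1$$
by applying L'Hôpital's rule twice, as $\|\hat{\boldsymbol{\theta}}\|\to 0$ our test statistic becomes
$$\hat{\boldsymbol{\theta}}^T\mathbf{t} - \left(\frac{1}{2}\pi\sigma_1^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}} + \frac{1}{2}(1-\pi)\sigma_2^2\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}\right) = \frac{1}{2\left(\pi\sigma_1^2 + (1-\pi)\sigma_2^2\right)}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$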
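Because $K(\boldsymbol{\theta})$ in (24) depends on $\boldsymbol{\theta}$ only through $u = \boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}$, (26) can be written as $\mathbf{t} = \lambda(u)\,\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\theta}}$, where $\lambda(u)$ is a weighted average of $\sigma_1^2$ and $\sigma_2^2$. Hence $\hat{\boldsymbol{\theta}} = (\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}/\lambda$, and the vector equation reduces to a scalar equation in $\lambda$ on $[\min(\sigma_1^2,\sigma_2^2), \max(\sigma_1^2,\sigma_2^2)]$ that bisection solves reliably. The sketch below uses this reduction, which is our own observation and is offered as one possible numerical route besides the general convex solvers cited above. It computes (27) and can be compared with the small-signal approximation (31). Helper names and numerical values, including the example $\mathbf{G}^T\mathbf{G}$, are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _tilt_logs(u, w, v1, v2):
    """Log-weights of the two mixture components after tilting, as functions of u."""
    return math.log(w) + 0.5 * v1 * u, math.log(1.0 - w) + 0.5 * v2 * u


def mixture_k(u, w, v1, v2):
    """K(theta) from (24), written as a function of u = theta^T G^T G theta."""
    a, b = _tilt_logs(u, w, v1, v2)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def effective_var(u, w, v1, v2):
    """lambda(u) in grad K(theta) = lambda(u) G^T G theta, read off from (26)."""
    a, b = _tilt_logs(u, w, v1, v2)
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return (ea * v1 + eb * v2) / (ea + eb)


def mixture_glrt(t, C, w, v1, v2, iters=200):
    """Return (theta_hat, statistic (27)) for C = G^T G, solving (26) through the scalar lambda."""
    z = solve(C, t)                                 # (G^T G)^{-1} t
    r = sum(ti * zi for ti, zi in zip(t, z))        # t^T (G^T G)^{-1} t
    lo, hi = min(v1, v2), max(v1, v2)
    for _ in range(iters):                          # lam - lambda(r / lam^2) is increasing in lam
        lam = 0.5 * (lo + hi)
        if lam - effective_var(r / lam ** 2, w, v1, v2) > 0.0:
            hi = lam
        else:
            lo = lam
    lam = 0.5 * (lo + hi)
    theta = [zi / lam for zi in z]
    return theta, r / lam - mixture_k(r / lam ** 2, w, v1, v2)


if __name__ == "__main__":
    C = [[20.0, 5.0], [5.0, 10.0]]                  # an example G^T G
    w, v1, v2 = 0.9, 50.0, 500.0
    for scale in (1e-3, 1.0, 30.0):
        t = [scale * 1.0, scale * -2.0]
        theta, stat = mixture_glrt(t, C, w, v1, v2)
        approx = [zi / (w * v1 + (1.0 - w) * v2) for zi in solve(C, t)]   # eq. (31)
        print(scale, theta, approx, stat)
```

In this sketch the statistic is computed from $\mathbf{t}$ only through $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$, which is the same quantity that appears in the small-signal form above.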

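To find the clairvoyant GLRT statistic, note that under $\mathcal{H}_1$
$$\mathbf{T} \sim \pi\mathcal{N}(\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}, \sigma_1^2\mathbf{G}^T\mathbf{G}) + (1-\pi)\mathcal{N}(\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}, \sigma_2^2\mathbf{G}^T\mathbf{G}) \tag{32}$$
Note the difference between (25) and (32): (25) is the constructed PDF, and (32) is the true PDF. The MLE of $\boldsymbol{\alpha}$ is found by maximizing
$$\begin{aligned} p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha}) &= \frac{\pi}{(2\pi)^{q/2}\det^{1/2}(\sigma_1^2\mathbf{G}^T\mathbf{G})}\exp\left[-\frac{1}{2}(\mathbf{t}-\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T\frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_1^2}(\mathbf{t}-\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right] \\ &\quad + \frac{1-\pi}{(2\pi)^{q/2}\det^{1/2}(\sigma_2^2\mathbf{G}^T\mathbf{G})}\exp\left[-\frac{1}{2}(\mathbf{t}-\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T\frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_2^2}(\mathbf{t}-\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right] \end{aligned}$$
When $q < p$, the MLE of $\boldsymbol{\alpha}$ may not be unique but satisfies $\mathbf{t} = \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}}$, since both exponents then attain their maximum value of zero. As a result, $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})$ is a constant, and the clairvoyant GLRT statistic becomes
$$-\ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$$
Since $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ decreases as $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$ increases, the clairvoyant GLRT statistic is equivalent to
$$\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{33}$$
which is the same as our test statistic as $\|\hat{\boldsymbol{\theta}}\| \to 0$.

Note that the noise in (23) is uncorrelated but not independent. We consider a general case in which the noise can be correlated, with PDF
$$\mathbf{w} \sim \pi\mathcal{N}(\mathbf{0}, \mathbf{C}_1) + (1-\pi)\mathcal{N}(\mathbf{0}, \mathbf{C}_2) \tag{34}$$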
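TABLE I
Comparison of our test statistic and the clairvoyant GLRT

Gaussian noise:
  Our method: $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$
  Clairvoyant GLRT: $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$

Uncorrelated non-Gaussian noise:
  Our method: $\max_{\boldsymbol{\theta}}\left[\boldsymbol{\theta}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}} + (1-\pi)e^{\frac{1}{2}\sigma_2^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}}\right)\right]$
  Clairvoyant GLRT: $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$

Correlated non-Gaussian noise:
  Our method: $\max_{\boldsymbol{\theta}}\left[\boldsymbol{\theta}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{C}_1\mathbf{G}\boldsymbol{\theta}} + (1-\pi)e^{\frac{1}{2}\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{C}_2\mathbf{G}\boldsymbol{\theta}}\right)\right]$
  Clairvoyant GLRT: $-\ln\left(\frac{\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_1\mathbf{G})}\exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_1\mathbf{G})^{-1}\mathbf{t}\right] + \frac{1-\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_2\mathbf{G})}\exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_2\mathbf{G})^{-1}\mathbf{t}\right]\right)$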

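It can be shown that, similar to (27), our test statistic is
$$\hat{\boldsymbol{\theta}}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{C}_1\mathbf{G}\hat{\boldsymbol{\theta}}} + (1-\pi)e^{\frac{1}{2}\hat{\boldsymbol{\theta}}^T\mathbf{G}^T\mathbf{C}_2\mathbf{G}\hat{\boldsymbol{\theta}}}\right) \tag{35}$$
and the clairvoyant GLRT statistic is
$$-\ln\left(\frac{\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_1\mathbf{G})}\exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_1\mathbf{G})^{-1}\mathbf{t}\right] + \frac{1-\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_2\mathbf{G})}\exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_2\mathbf{G})^{-1}\mathbf{t}\right]\right) \tag{36}$$
when $q < p$.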


C. Summary

We have considered a partially observed linear model with both Gaussian and non-Gaussian noise. Table I compares our test statistic with the clairvoyant GLRT.

In Gaussian noise, $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$, the test statistics are exactly the same.

In uncorrelated non-Gaussian noise, $\mathbf{w} \sim \pi\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{I}) + (1-\pi)\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{I})$, the test statistics are the same as $\hat{\boldsymbol{\theta}} \to \mathbf{0}$.

In correlated non-Gaussian noise, $\mathbf{w} \sim \pi\mathcal{N}(\mathbf{0}, \mathbf{C}_1) + (1-\pi)\mathcal{N}(\mathbf{0}, \mathbf{C}_2)$, we cannot show the equivalence of the two test statistics, but we will see in Section VII that their performances appear to be the same.


VI. Examples-Distributed Classification

In this section, we compare our method with the estimated MAP classifier for some classification problems. The estimated MAP classifier assumes that the PDF of $\mathbf{T}$ under $\mathcal{H}_i$ is known except for some unknown underlying parameters $\boldsymbol{\alpha}_i$. We assume equal prior probabilities for the candidate hypotheses, i.e., $p(\mathcal{H}_1) = \cdots = p(\mathcal{H}_M) = 1/M$. So the estimated MAP classifier reduces to the estimated maximum likelihood classifier [1], which finds the MLE of $\boldsymbol{\alpha}_i$ and chooses the $\mathcal{H}_i$ for which
$$p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}}_i) \tag{37}$$
is maximum over $i$, where $\hat{\boldsymbol{\alpha}}_i$ is the MLE of $\boldsymbol{\alpha}_i$.


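A. Linear Model with Known Variance

Consider the following classification model:
$$\mathcal{H}_i: \mathbf{x} = A_i\mathbf{s}_i + \mathbf{w} \tag{38}$$
where $\mathbf{s}_i$ is an $N\times 1$ known signal vector with the same length as $\mathbf{x}$, $A_i$ is the unknown signal amplitude, and $\mathbf{w}$ is white Gaussian noise with known variance $\sigma^2$. Assume that instead of observing $\mathbf{x}$, we can only observe the measurements of two sensors
$$\mathbf{T}_1 = \mathbf{H}_1^T\mathbf{x}, \qquad \mathbf{T}_2 = \mathbf{H}_2^T\mathbf{x} \tag{39}$$
where $\mathbf{H}_1$ is $N\times p_1$ and $\mathbf{H}_2$ is $N\times p_2$. Here $p_1$ and $p_2$ are the lengths of the vectors $\mathbf{T}_1$ and $\mathbf{T}_2$, respectively.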

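We can write (39) as
$$\mathbf{T} = \mathbf{G}^T\mathbf{x} \tag{40}$$
by letting
$$\mathbf{T} = \begin{bmatrix}\mathbf{T}_1 \\ \mathbf{T}_2\end{bmatrix}$$
and
$$\mathbf{G} = [\mathbf{H}_1\ \mathbf{H}_2]$$
where $\mathbf{G}$ is $N\times(p_1+p_2)$ with $p_1 + p_2 \leq N$. We assume that $\mathbf{G}$ has full column rank, so that there are no redundant sensor measurements. Note that $\mathbf{G}$ can be any matrix with full column rank.

Let $\mathcal{H}_0$ be the reference hypothesis when there is noise only, i.e.,
$$\mathcal{H}_0: \mathbf{x} = \mathbf{w} \tag{41}$$
Since $\mathbf{x}$ is Gaussian under $\mathcal{H}_0$, according to (40), $\mathbf{T}$ is also Gaussian, and
$$\mathbf{T} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{G}^T\mathbf{G})$$
under $\mathcal{H}_0$. We construct the PDF under $\mathcal{H}_i$ as in (1) with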

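$$K(\boldsymbol{\theta}_i) = \ln E_0\left[\exp(\boldsymbol{\theta}_i^T\mathbf{T})\right] = \frac{1}{2}\sigma^2\boldsymbol{\theta}_i^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i \tag{42}$$
Hence the constructed PDF is
$$\begin{aligned} p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}_i) &= \exp\left[\boldsymbol{\theta}_i^T\mathbf{t} - K(\boldsymbol{\theta}_i) + \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\right] \\ &= \frac{1}{(2\pi\sigma^2)^{(p_1+p_2)/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})}\exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma^2}\right)\cdot\exp\left(\boldsymbol{\theta}_i^T\mathbf{t} - \frac{1}{2}\sigma^2\boldsymbol{\theta}_i^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i\right) \end{aligned} \tag{43}$$
which can be simplified as
$$\mathbf{T} \sim \mathcal{N}(\sigma^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_i \tag{44}$$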

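The next step is to find the MLE of $\boldsymbol{\theta}_i$, which is obtained by maximizing $\boldsymbol{\theta}_i^T\mathbf{t} - K(\boldsymbol{\theta}_i)$ over $\boldsymbol{\theta}_i$. If this optimization were carried out without any constraint, $\hat{\boldsymbol{\theta}}_i$ would be the same for all $i$. Hence we need some implicit constraints in finding the MLE. Since $\boldsymbol{\theta}_i$ represents the signal under $\mathcal{H}_i$, we should have
$$\boldsymbol{\theta}_i = A_i\mathbf{G}^T\mathbf{s}_i = E_{\mathcal{H}_i}(\mathbf{T}) \tag{45}$$
which is the mean of $\mathbf{T}$ under $\mathcal{H}_i$. As a result, (44) can be written as
$$\mathbf{T} \sim \mathcal{N}(\sigma^2A_i\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_i \tag{46}$$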


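Thus, instead of finding the MLE of $\boldsymbol{\theta}_i$ by maximizing
$$\boldsymbol{\theta}_i^T\mathbf{t} - K(\boldsymbol{\theta}_i) = \boldsymbol{\theta}_i^T\mathbf{t} - \frac{1}{2}\sigma^2\boldsymbol{\theta}_i^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i \tag{47}$$
with the constraint in (45), we can find the MLE of $A_i$ in (46) and then plug it into (45). It can be found that
$$\hat{A}_i = \frac{\mathbf{s}_i^T\mathbf{G}\mathbf{t}}{\sigma^2\mathbf{s}_i^T\mathbf{G}\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i} \tag{48}$$
and
$$\hat{\boldsymbol{\theta}}_i = \frac{\mathbf{G}^T\mathbf{s}_i\mathbf{s}_i^T\mathbf{G}\mathbf{t}}{\sigma^2\mathbf{s}_i^T\mathbf{G}\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i} \tag{49}$$
Hence, by removing the constant factors, the test statistic of our classifier for $\mathcal{H}_i$ is
$$\frac{(\mathbf{s}_i^T\mathbf{G}\mathbf{t})^2}{(\mathbf{G}^T\mathbf{s}_i)^T\mathbf{G}^T\mathbf{G}(\mathbf{G}^T\mathbf{s}_i)} \tag{50}$$
according to (8).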
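For completeness, the maximization behind (48)-(50) can be written out. Equivalently to maximizing the likelihood (46), we can substitute the constraint (45), $\boldsymbol{\theta}_i = A_i\mathbf{g}_i$ with $\mathbf{g}_i = \mathbf{G}^T\mathbf{s}_i$, into (47), which gives a scalar quadratic in $A_i$. The shorthand $\mathbf{g}_i$ and $f(A_i)$ is introduced here for this sketch.
$$f(A_i) = A_i\,\mathbf{g}_i^T\mathbf{t} - \frac{1}{2}\sigma^2A_i^2\,\mathbf{g}_i^T\mathbf{G}^T\mathbf{G}\mathbf{g}_i$$
$$\frac{df(A_i)}{dA_i} = \mathbf{g}_i^T\mathbf{t} - \sigma^2A_i\,\mathbf{g}_i^T\mathbf{G}^T\mathbf{G}\mathbf{g}_i = 0 \quad\Longrightarrow\quad \hat{A}_i = \frac{\mathbf{g}_i^T\mathbf{t}}{\sigma^2\,\mathbf{g}_i^T\mathbf{G}^T\mathbf{G}\mathbf{g}_i}$$
$$f(\hat{A}_i) = \frac{(\mathbf{g}_i^T\mathbf{t})^2}{2\sigma^2\,\mathbf{g}_i^T\mathbf{G}^T\mathbf{G}\mathbf{g}_i}$$
Since $\mathbf{g}_i^T\mathbf{t} = \mathbf{s}_i^T\mathbf{G}\mathbf{t}$ and $\mathbf{g}_i^T\mathbf{G}^T\mathbf{G}\mathbf{g}_i = \mathbf{s}_i^T\mathbf{G}\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i$, the second line is (48), $\hat{\boldsymbol{\theta}}_i = \hat{A}_i\mathbf{g}_i$ is (49), and dropping the factor $1/(2\sigma^2)$, which is common to all $i$, from the last line gives (50).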


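Next we consider the estimated MAP classifier. In this case, we assume that we know the true PDF
$$\mathbf{T} \sim \mathcal{N}(A_i\mathbf{G}^T\mathbf{s}_i, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_i \tag{51}$$
Note that (51) is the true PDF of $\mathbf{T}$ under $\mathcal{H}_i$, and (46) is the constructed PDF. It can be found that the MLE of $A_i$ in the true PDF under $\mathcal{H}_i$ is
$$\hat{A}_i = \frac{\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{s}_i} \tag{52}$$
By removing the constant terms, the test statistic of the estimated MAP classifier for $\mathcal{H}_i$ is
$$\frac{\left(\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}\right)^2}{(\mathbf{G}^T\mathbf{s}_i)^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{G}^T\mathbf{s}_i)} \tag{53}$$
according to (37). Note that (48) and (52) are different because (48) is the MLE of $A_i$ under the constructed PDF and (52) is the MLE of $A_i$ under the true PDF. Also note that if $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix, the test statistics in (50) and (53) are equivalent, and hence our method coincides with the estimated MAP classifier.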
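The statistics (50) and (53), and their coincidence when $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix, can be checked with a few lines of code. This plain-Python sketch assumes a small hypothetical $\mathbf{G}$ whose two columns are orthogonal and have equal norm ($\mathbf{G}^T\mathbf{G} = 4\mathbf{I}$), three arbitrary signal vectors, and an arbitrary $\mathbf{t}$. The helper and function names are ours.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gram(G):
    """C = G^T G."""
    Gt = transpose(G)
    return [[dot(ci, cj) for cj in Gt] for ci in Gt]


def ours_known_var(t, G, s):
    """Statistic (50): (s^T G t)^2 / ((G^T s)^T G^T G (G^T s))."""
    g = matvec(transpose(G), s)
    return dot(g, t) ** 2 / dot(g, matvec(gram(G), g))


def map_known_var(t, G, s):
    """Statistic (53): (s^T G (G^T G)^{-1} t)^2 / ((G^T s)^T (G^T G)^{-1} (G^T s))."""
    g = matvec(transpose(G), s)
    c_inv_g = solve(gram(G), g)
    return dot(c_inv_g, t) ** 2 / dot(g, c_inv_g)


def classify(t, G, signals, statistic):
    """Index i of the hypothesis maximizing the statistic (equal priors)."""
    return max(range(len(signals)), key=lambda i: statistic(t, G, signals[i]))


if __name__ == "__main__":
    G = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]     # G^T G = 4 I
    signals = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]]
    t = [0.7, -1.3]
    for s in signals:
        print(ours_known_var(t, G, s), map_known_var(t, G, s))
    print(classify(t, G, signals, ours_known_var), classify(t, G, signals, map_known_var))
```

Replacing `G` with a matrix whose Gram matrix is not a multiple of the identity generally makes the two statistics differ, as noted above.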


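B. Linear Model with Unknown Variance

To extend the above example, we consider the same linear model with unknown noise variance $\sigma^2$. As shown in (46), the constructed PDF is still
$$\mathbf{T} \sim \mathcal{N}(\sigma^2A_i\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_i \tag{54}$$
except that $\sigma^2$ is now unknown. Letting $B_i = \sigma^2A_i$, we have
$$\mathbf{T} \sim \mathcal{N}(B_i\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_i \tag{55}$$
Instead of finding the MLEs of $A_i$ and $\sigma^2$, we can find the MLEs of $B_i$ and $\sigma^2$. Let $\mathbf{h}_i = \mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i$ and $\mathbf{C} = \mathbf{G}^T\mathbf{G}$. It can be shown that
$$\hat{B}_i = (\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{t} \tag{56}$$
and
$$\hat{\sigma}^2 = \frac{1}{p_1+p_2}(\mathbf{t} - \mathbf{h}_i\hat{B}_i)^T\mathbf{C}^{-1}(\mathbf{t} - \mathbf{h}_i\hat{B}_i) \tag{57}$$
By removing the constant factors, it can also be shown that the test statistic is equivalent to
$$\frac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T\left[\mathbf{C}^{-1} - \mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\right]\mathbf{t}} \tag{58}$$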
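TABLE II
Comparison of our test statistic and the estimated MAP classifier

Known $\sigma^2$:
  Our method: $\dfrac{(\mathbf{s}_i^T\mathbf{G}\mathbf{t})^2}{(\mathbf{G}^T\mathbf{s}_i)^T\mathbf{G}^T\mathbf{G}(\mathbf{G}^T\mathbf{s}_i)}$
  Estimated MAP: $\dfrac{\left(\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}\right)^2}{(\mathbf{G}^T\mathbf{s}_i)^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{G}^T\mathbf{s}_i)}$

Unknown $\sigma^2$:
  Our method: $\dfrac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T\left[\mathbf{C}^{-1} - \mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\right]\mathbf{t}}$
  Estimated MAP: $\dfrac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T\left[\mathbf{C}^{-1} - \mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\right]\mathbf{t}}$

where $\mathbf{h}_i = \mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i$, $\mathbf{g}_i = \mathbf{G}^T\mathbf{s}_i$, and $\mathbf{C} = \mathbf{G}^T\mathbf{G}$.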


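Next we consider the estimated MAP classifier. The true PDF is still
$$\mathbf{T} \sim \mathcal{N}(A_i\mathbf{G}^T\mathbf{s}_i, \sigma^2\mathbf{G}^T\mathbf{G}) \text{ under } \mathcal{H}_i \tag{59}$$
with unknown $A_i$ and $\sigma^2$. Let $\mathbf{g}_i = \mathbf{G}^T\mathbf{s}_i$ and $\mathbf{C} = \mathbf{G}^T\mathbf{G}$. Similar to (56), (57), and (58), it can be shown that
$$\hat{A}_i = (\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{t} \tag{60}$$
$$\hat{\sigma}^2 = \frac{1}{p_1+p_2}(\mathbf{t} - \mathbf{g}_i\hat{A}_i)^T\mathbf{C}^{-1}(\mathbf{t} - \mathbf{g}_i\hat{A}_i) \tag{61}$$
and the test statistic of the estimated MAP classifier is
$$\frac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T\left[\mathbf{C}^{-1} - \mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\right]\mathbf{t}} \tag{62}$$
Note that if $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix, then, since $\mathbf{h}_i = \mathbf{G}^T\mathbf{G}\mathbf{g}_i$, the test statistics in (58) and (62) are equivalent. Hence our method is exactly the same as the estimated MAP classifier if $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix.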
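Both (58) and (62) have the form $\mathbf{t}^T\mathbf{P}\mathbf{t}/\mathbf{t}^T(\mathbf{C}^{-1}-\mathbf{P})\mathbf{t}$ with $\mathbf{P} = \mathbf{C}^{-1}\mathbf{v}(\mathbf{v}^T\mathbf{C}^{-1}\mathbf{v})^{-1}\mathbf{v}^T\mathbf{C}^{-1}$, where $\mathbf{v} = \mathbf{h}_i$ for our method and $\mathbf{v} = \mathbf{g}_i$ for the estimated MAP classifier. The plain-Python sketch below evaluates this ratio. It checks that the ratio does not change when $\mathbf{t}$ is scaled, which is why the unknown $\sigma^2$ drops out, and that the two statistics agree when $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix. Helper names and numbers are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gram(G):
    """C = G^T G."""
    Gt = transpose(G)
    return [[dot(ci, cj) for cj in Gt] for ci in Gt]


def ratio_statistic(t, C, v):
    """t^T P t / t^T (C^{-1} - P) t with P = C^{-1} v (v^T C^{-1} v)^{-1} v^T C^{-1}."""
    c_inv_v = solve(C, v)
    explained = dot(c_inv_v, t) ** 2 / dot(v, c_inv_v)     # t^T P t
    return explained / (dot(t, solve(C, t)) - explained)


def ours_unknown_var(t, G, s):
    """Eq. (58) with h_i = G^T G G^T s_i."""
    C = gram(G)
    return ratio_statistic(t, C, matvec(C, matvec(transpose(G), s)))


def map_unknown_var(t, G, s):
    """Eq. (62) with g_i = G^T s_i."""
    return ratio_statistic(t, gram(G), matvec(transpose(G), s))


if __name__ == "__main__":
    G_iso = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]   # G^T G = 4 I
    G_gen = [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]     # G^T G not a scaled identity
    s, t = [1.0, 2.0, 3.0, 4.0], [0.7, -1.3]
    for G in (G_iso, G_gen):
        print(ours_unknown_var(t, G, s), map_unknown_var(t, G, s))
```

The denominator vanishes only when $\mathbf{t}$ is proportional to $\mathbf{v}$, which is a measure-zero event for noisy data but should be guarded against in a real implementation.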

C. Summary

We have considered a linear model with both known and unknown noise variance. Table II compares our test statistic with the estimated MAP classifier. If $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix, our method and the estimated MAP classifier are exactly the same.

VII. Simulations

A. Distributed Detection

Since our test statistic coincides with the clairvoyant GLRT under Gaussian noise, as shown in Subsection V-A, we only compare the performances under non-Gaussian noise (both uncorrelated noise as in (23) and correlated noise as in (34)).
Consider the model
$$x[n] = A_1 + A_2r^n + A_3\cos(2\pi fn + \phi) + w[n] \tag{63}$$
for $n = 0, 1, \ldots, N-1$, with known $r$ and frequency $f$ but unknown amplitudes $A_1$, $A_2$, $A_3$ and phase $\phi$. This is a linear model as in (15), where
$$\mathbf{H} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & r & \cos(2\pi f) & \sin(2\pi f) \\ \vdots & \vdots & \vdots & \vdots \\ 1 & r^{N-1} & \cos(2\pi f(N-1)) & \sin(2\pi f(N-1)) \end{bmatrix}$$
and $\boldsymbol{\alpha} = [A_1, A_2, A_3\cos\phi, -A_3\sin\phi]^T$.

Let $\mathbf{w}$ have an uncorrelated Gaussian mixture distribution as in (23). For the partially observed linear model, we observe two sensor outputs as in (16). We compare the GLRT in (27) with the clairvoyant GLRT in (33). Note that the MLE of $\theta$ in (27) is found numerically, not by the asymptotic approximation in (31). In the simulation, we use $N = 20$, $A_1 = 2$, $A_2 = 3$, $A_3 = 4$, $\phi = \pi/4$, $r = 0.95$, $f = 0.34$, $\pi = 0.9$, $\sigma_1^2 = 50$, $\sigma_2^2 = 500$, and $\mathbf{H}_1$ and $\mathbf{H}_2$ are the first and third columns of $\mathbf{H}$, respectively, i.e., $\mathbf{H}_1 = [1, 1, \ldots, 1]^T$ and $\mathbf{H}_2 = [1, \cos(2\pi f), \ldots, \cos(2\pi f(N-1))]^T$. As shown in Figure 2, the performances are almost the same, which is consistent with their equivalence under the small-signal assumption shown in Section V.
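A minimal sketch of how the data for this experiment could be generated follows. It assumes that (23) is an i.i.d. two-component mixture $\pi\,\mathcal{N}(0,\sigma_1^2) + (1-\pi)\,\mathcal{N}(0,\sigma_2^2)$ and that the two sensors in (16) report $\mathbf{t}_k = \mathbf{H}_k^T\mathbf{x}$. Both are our reading of the earlier sections, and the function names are hypothetical. The GLRTs themselves are not reproduced.

```python
import numpy as np


def gaussian_mixture_noise(N, pi, var1, var2, rng):
    """i.i.d. samples from pi*N(0, var1) + (1 - pi)*N(0, var2) (assumed form of (23))."""
    from_first = rng.random(N) < pi                  # component label for each sample
    std = np.where(from_first, np.sqrt(var1), np.sqrt(var2))
    return std * rng.standard_normal(N)


def sensor_outputs(x, H1, H2):
    """Partial observations t_k = H_k^T x (assumed form of (16))."""
    return H1.T @ x, H2.T @ x


rng = np.random.default_rng(1)
N, r, f = 20, 0.95, 0.34
A1, A2, A3, phi = 2.0, 3.0, 4.0, np.pi / 4
n = np.arange(N)
s = A1 + A2 * r ** n + A3 * np.cos(2 * np.pi * f * n + phi)
H1 = np.ones((N, 1))                                 # first column of H
H2 = np.cos(2 * np.pi * f * n)[:, None]              # third column of H
x = s + gaussian_mixture_noise(N, 0.9, 50.0, 500.0, rng)
t1, t2 = sensor_outputs(x, H1, H2)
```

With these values the mixture has variance $0.9 \cdot 50 + 0.1 \cdot 500 = 95$, but its heavy tail (the 500 component) is what makes the noise non-Gaussian.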


Fig.  2.  ROC  curves  for  the  GLRT  using  the  constructed  PDF  and  the  clairvoyant  GLRT  with  uncorrelated  Gaussian  mixture 
noise. 
Next, for the same model in (63), let $\mathbf{w}$ have a correlated Gaussian mixture distribution as in (34). We compare the performance of the GLRT using the constructed PDF as in (35) with that of the clairvoyant GLRT as in (36). We use $N = 20$, $A_1 = 3$, $A_2 = 4$, $A_3 = 3$, $\phi = \pi/7$, $r = 0.9$, $f = 0.46$, $\pi = 0.7$, $\mathbf{H}_1 = [1, 1, \ldots, 1]^T$, and $\mathbf{H}_2 = [1, \cos(2\pi f), \ldots, \cos(2\pi f(N-1))]^T$. The covariance matrices $\mathbf{C}_1$ and $\mathbf{C}_2$ are generated as $\mathbf{C}_1 = \mathbf{R}_1^T\mathbf{R}_1$ and $\mathbf{C}_2 = \mathbf{R}_2^T\mathbf{R}_2$, where $\mathbf{R}_1$ and $\mathbf{R}_2$ are full-rank $N \times N$ matrices. As shown in Figure 3, the performances are still very similar.
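Correlated Gaussian noise with covariance $\mathbf{C}_k = \mathbf{R}_k^T\mathbf{R}_k$ can be drawn without a Cholesky factorization: if $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, then $\mathbf{R}_k^T\mathbf{z}$ has covariance $\mathbf{R}_k^T\mathbf{R}_k$. The sketch below assumes that (34) is the mixture $\pi\,\mathcal{N}(\mathbf{0},\mathbf{C}_1) + (1-\pi)\,\mathcal{N}(\mathbf{0},\mathbf{C}_2)$ and that $\mathbf{R}_1$, $\mathbf{R}_2$ have i.i.d. standard normal entries. Neither detail is stated in this section, and the function name is hypothetical.

```python
import numpy as np


def correlated_mixture_noise(R1, R2, pi, size, rng):
    """Rows are draws of w ~ pi*N(0, R1^T R1) + (1 - pi)*N(0, R2^T R2) (assumed form of (34))."""
    N = R1.shape[0]
    z = rng.standard_normal((size, N))               # each row z ~ N(0, I)
    from_first = rng.random(size) < pi               # mixture component per draw
    # As a row vector, z @ R equals (R^T z^T)^T, whose covariance is R^T R.
    return np.where(from_first[:, None], z @ R1, z @ R2)


rng = np.random.default_rng(3)
N = 20
R1 = rng.standard_normal((N, N))                     # full rank with probability one
R2 = rng.standard_normal((N, N))
C1, C2 = R1.T @ R1, R2.T @ R2
w = correlated_mixture_noise(R1, R2, 0.7, 1, rng)[0] # one length-N noise vector
```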


Fig.  3.  ROC  curves  for  the  GLRT  using  the  constructed  PDF  and  the  clairvoyant  GLRT  with  correlated  Gaussian  mixture 
noise. 


B.  Distributed  Classification 
For the model in (38),

$$\mathcal{H}_i: \mathbf{x} = A_i\mathbf{s}_i + \mathbf{w},$$

we first consider a case in which $\mathbf{G}^T\mathbf{G}$ is approximately a scaled identity matrix. Let $A_1 = 0.4$, $A_2 = 1.2$, $A_3 = 0.9$, and

$$\begin{aligned} s_1(n) &= \cos(2\pi f_1 n) \\ s_2(n) &= \cos(2\pi f_2 n) \\ s_3(n) &= \cos(2\pi f_3 n) \end{aligned}$$
where $n = 0, 1, \ldots, N-1$ with $N = 25$, and $f_1 = 0.14$, $f_2 = 0.34$, $f_3 = 0.41$. Let $p(\mathcal{H}_1) = p(\mathcal{H}_2) = p(\mathcal{H}_3) = 1/3$. Assume that there are two sensors, with observation matrices

$$\mathbf{H}_1 = \begin{bmatrix} 1 & \cos(2\pi f_1) & \cdots & \cos(2\pi f_1(N-1)) \\ 1 & \cos(2\pi f_2) & \cdots & \cos(2\pi f_2(N-1)) \end{bmatrix}^T$$

$$\mathbf{H}_2 = \begin{bmatrix} 1 & \cos(2\pi f_3) & \cdots & \cos(2\pi f_3(N-1)) \end{bmatrix}^T$$
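The claim that $\mathbf{G}^T\mathbf{G}$ is approximately a scaled identity can be checked numerically. The sketch below assumes $\mathbf{G} = [\mathbf{H}_1\ \mathbf{H}_2]$, with $\mathbf{H}_1$ holding the $f_1$ and $f_2$ cosines and $\mathbf{H}_2$ the $f_3$ cosine (our reading of the observation matrices above). Because cosines at well-separated frequencies are nearly orthogonal over $N$ samples, each diagonal entry is close to $N/2$ and the off-diagonal entries are small by comparison. The helper name is hypothetical.

```python
import numpy as np


def cosine_columns(freqs, N):
    """N x len(freqs) matrix whose k-th column is cos(2*pi*freqs[k]*n), n = 0..N-1."""
    n = np.arange(N)
    return np.column_stack([np.cos(2 * np.pi * fk * n) for fk in freqs])


N = 25
f1, f2, f3 = 0.14, 0.34, 0.41
H1 = cosine_columns([f1, f2], N)        # sensor 1: N x 2
H2 = cosine_columns([f3], N)            # sensor 2: N x 1
G = np.hstack([H1, H2])                 # G = [H1 H2], N x 3
C = G.T @ G
off = C - np.diag(np.diag(C))           # off-diagonal part of G^T G
print(np.round(C / (N / 2), 2))         # roughly the 3 x 3 identity
print(np.abs(off).max() / np.diag(C).min())
```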

We use (50) and (53) as the test statistics for the two methods, respectively, when $\sigma^2$ is known, and (58) and (62) when $\sigma^2$ is unknown. The probabilities of correct classification are plotted versus $\ln(1/\sigma^2)$ in Figure 4. Our method has the same performance as the estimated MAP classifier with known or unknown $\sigma^2$, and the probability of correct classification goes to 1 as $\sigma^2 \to 0$.
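A Monte Carlo estimate of the probability of correct classification, of the kind plotted in Figure 4, can be sketched as follows for the known-$\sigma^2$ case. The decision rule here is one standard form of the estimated MAP classifier with equal priors. Substituting the MLE of $A_i$ into the Gaussian likelihood of $\mathbf{t}$ leaves $(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{t})^2/(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)$ as the only hypothesis-dependent term, so $\sigma^2$ drops out of the decision. We have not checked this rule symbol by symbol against (53), and the function name `prob_correct` is hypothetical.

```python
import numpy as np


def prob_correct(G, S, amps, var, trials, rng):
    """Monte Carlo P_c for x = A_i s_i + w, w ~ N(0, var * I), observed as t = G^T x.

    Hypotheses are drawn with equal priors; each trial picks the i that maximizes
    (g_i^T C^{-1} t)^2 / (g_i^T C^{-1} g_i), with C = G^T G and g_i = G^T s_i.
    """
    N, M = S.shape
    Cinv = np.linalg.inv(G.T @ G)
    Gs = G.T @ S                                  # column i is g_i
    q = np.sum(Gs * (Cinv @ Gs), axis=0)          # g_i^T C^{-1} g_i for every i
    correct = 0
    for _ in range(trials):
        i = rng.integers(M)                       # true hypothesis
        x = amps[i] * S[:, i] + np.sqrt(var) * rng.standard_normal(N)
        t = G.T @ x
        stat = (Gs.T @ (Cinv @ t)) ** 2 / q
        correct += int(np.argmax(stat) == i)
    return correct / trials


# First classification example: s_i = cos(2*pi*f_i*n); under our reading of H1, H2,
# G = [H1 H2] has exactly these columns.
N = 25
n = np.arange(N)
S = np.column_stack([np.cos(2 * np.pi * fk * n) for fk in (0.14, 0.34, 0.41)])
G = S.copy()
amps = (0.4, 1.2, 0.9)
rng = np.random.default_rng(5)
for var in (1.0, 1e-2, 1e-4):
    print(f"ln(1/var) = {np.log(1 / var):5.2f}  P_c ~ {prob_correct(G, S, amps, var, 2000, rng):.3f}")
```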


Fig.  4.  Probability  of  correct  classification  for  both  methods. 

Next we consider a case in which $\mathbf{G}^T\mathbf{G}$ is not a scaled identity matrix. Let $A_1 = 0.5$, $A_2 = 1$, $A_3 = 1$, and

$$\begin{aligned} s_1(n) &= \cos(2\pi f_1 n) + 1 \\ s_2(n) &= \cos(2\pi f_2 n) + 0.5 \\ s_3(n) &= \cos(2\pi f_3 n) \end{aligned}$$

where $n = 0, 1, \ldots, N-1$ with $N = 20$, and $f_1 = 0.17$, $f_2 = 0.28$, $f_3 = 0.45$. Let $p(\mathcal{H}_1) = p(\mathcal{H}_2) = p(\mathcal{H}_3) = 1/3$. Assume that there are three sensors (an extension of the two-sensor assumption), with observation matrices

$$\mathbf{H}_1 = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T$$

$$\mathbf{H}_2 = \begin{bmatrix} 1 & \cos(2\pi f_1) & \cdots & \cos(2\pi f_1(N-1)) \\ 1 & \cos(2\pi f_2) & \cdots & \cos(2\pi f_2(N-1)) \end{bmatrix}^T$$

$$\mathbf{H}_3 = \begin{bmatrix} 1 & \cos(2\pi(f_3+0.02)) & \cdots & \cos(2\pi(f_3+0.02)(N-1)) \end{bmatrix}^T$$
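The same numerical check shows why $\mathbf{G}^T\mathbf{G}$ is no longer a scaled identity in this configuration. The constant column of $\mathbf{H}_1$ has squared norm $N = 20$, while each cosine column has squared norm of roughly $N/2$. The sketch below assumes that $\mathbf{H}_2$ holds the $f_1$ and $f_2$ cosines and $\mathbf{H}_3$ the mismatched $f_3 + 0.02$ cosine (our reading of the matrices above). It also forms $\mathbf{g}_i = \mathbf{G}^T\mathbf{s}_i$ from the true signals, so the frequency mismatch enters only through $\mathbf{G}$. The function name `second_case` is hypothetical.

```python
import numpy as np


def second_case(N=20, f1=0.17, f2=0.28, f3=0.45, mismatch=0.02):
    """Return G = [H1 H2 H3] and S = [s1 s2 s3] for the three-sensor example (assumed layout)."""
    n = np.arange(N)
    H1 = np.ones((N, 1))                                       # sensor 1: constant column
    H2 = np.column_stack([np.cos(2 * np.pi * f1 * n),
                          np.cos(2 * np.pi * f2 * n)])         # sensor 2: f1 and f2 cosines
    H3 = np.cos(2 * np.pi * (f3 + mismatch) * n)[:, None]      # sensor 3: inaccurate frequency
    G = np.hstack([H1, H2, H3])
    S = np.column_stack([np.cos(2 * np.pi * f1 * n) + 1.0,
                         np.cos(2 * np.pi * f2 * n) + 0.5,
                         np.cos(2 * np.pi * f3 * n)])          # true signals s1, s2, s3
    return G, S


G, S = second_case()
C = G.T @ G
print(np.round(np.diag(C), 2))   # first entry is N = 20; cosine columns give roughly N/2
g = G.T @ S                      # column i is g_i = G^T s_i
```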

Note that in $\mathbf{H}_3$ we set the frequency to $f_3 + 0.02$, which models the case in which the frequency is not known accurately. We also see in Figure 5 that the performances of both methods are the same with known or unknown $\sigma^2$, and the probability of correct classification goes to 1 as $\sigma^2 \to 0$.


Fig. 5. Probability of correct classification for both methods (horizontal axis: $\ln(1/\sigma^2)$).


VIII.  Conclusions 

A novel method of constructing the joint PDF of the measurements from multiple sensors in distributed systems has been proposed. Only a reference PDF is needed in the construction. The constructed PDF is asymptotically optimal in KL divergence. The performance of our method has been shown to be as good as that of existing methods for both detection and classification, while our method requires less information.

References 

[1] S. Kay, Fundamentals of Statistical Signal Processing: Detection Theory. Englewood Cliffs, NJ: Prentice-Hall, 1998.




[2]  S.  Thomopoulos,  R.  Viswanathan,  and  D.  Bougoulias,  “Optimal  distributed  decision  fusion,”  IEEE  Trans.  Aerosp.  Electron. 
Syst.,  vol.  25,  pp.  761-765,  Sep.  1989. 

[3]  Z.  Chair  and  P.  Varshney,  “Optimal  data  fusion  in  multiple  sensor  detection  systems,”  IEEE  Trans.  Aerosp.  Electron.  Syst., 
vol.  22,  pp.  98-101,  Jan.  1986. 

[4] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, pp. 226-239, Mar. 1998.

[5]  A.  Sundaresan,  P.  Varshney,  and  N.  Rao,  “Distributed  detection  of  a  nuclear  radioactive  source  using  fusion  of  correlated 
decisions,”  in  Information  Fusion,  2007  10th  International  Conference  on,  2007,  pp.  1-7. 

[6]  S.  Iyengar,  P.  Varshney,  and  T.  Damarla,  “A  parametric  copula  based  framework  for  multimodal  signal  processing,”  in 
ICASSP,  2009,  pp.  1893-1896. 

[7]  S.  Kay  and  Q.  Ding,  “Exponentially  embedded  families  for  multimodal  sensor  processing,”  in  ICASSP,  Mar.  2010,  pp. 
3770-3773. 

[8]  S.  Kay,  Fundamentals  of  Statistical  Signal  Processing:  Estimation  Theory.  Englewood  Cliffs,  NJ:  Prentice-Hall,  1993. 

[9]  S.  Kay,  A.  Nuttall,  and  P.  Baggenstoss,  “Multidimensional  probability  density  function  approximations  for  detection, 
classification,  and  model  order  selection,”  IEEE  Trans.  Signal  Process.,  vol.  49,  pp.  2240-2252,  Oct.  2001. 

[10]  S.  Kay,  Q.  Ding,  and  D.  Emge,  “Joint  pdf  construction  for  sensor  fusion  and  distributed  detection,”  in  International 
Conference  on  Information  Fusion,  Jun.  2010. 

[11]  L.  Brown,  Fundamentals  of  Statistical  Exponential  Families.  Institute  of  Mathematical  Statistics,  1986. 

[12] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[13]  D.  Luenberger,  Linear  and  Nonlinear  Programming,  2nd  ed.  Springer,  2003. 

[14] S. Kay, Q. Ding, and M. Rangaswamy, "Sensor integration for classification," in Asilomar Conference on Signals, Systems, and Computers, Nov. 2010.

[15]  S.  Kullback,  Information  Theory  and  Statistics,  2nd  ed.  Courier  Dover  Publications,  1997. 

[16]  T.  Cover  and  J.  Thomas,  Elements  of  Information  Theory,  2nd  ed.  John  Wiley  and  Sons,  2006. 

[17]  M.  Westover,  “Asymptotic  geometry  of  multiple  hypothesis  testing,”  IEEE  Trans.  Inf.  Theory,  vol.  54,  no.  7,  pp.  3327-3329, 
Jul.  2008. 

[18]  S.  Kay,  “Exponentially  embedded  families  -  new  approaches  to  model  order  estimation,”  IEEE  Trans.  Aerosp.  Electron. 
Syst.,  vol.  41,  pp.  333-345,  Jan.  2005. 




